Bump databricks-sdk-java from 0.69.0 to 0.106.0 #1464

Closed
msrathore-db wants to merge 1 commit into databricks:main from msrathore-db:chore/sdk-upgrade-0.106

@msrathore-db

Summary

  • Bumps databricks-sdk-java from 0.69.0 → 0.106.0 (37 minor versions).
  • Removes the driver-side AgentDetector.detect() call in UserAgentManager.setUserAgent() to avoid duplicate agent/<name> User-Agent tokens. SDK 0.106 introduced native AI-coding-agent detection in com.databricks.sdk.core.UserAgent (new agentProvider() + 9-agent listKnownAgents()); layering the driver's own injection on top produced agent/×2 on every SDK-routed request.

Why this is safe to ship

Bytecode-diffed every driver-imported SDK class between 0.69 and 0.106. All 38 imports survive to 0.106 with compatible signatures (compile passes unchanged).

The agent/<name> collision was the only behavior-impacting delta on the driver's hot path. The other 0.69→0.106 deltas fall into two groups. Some don't reach driver code paths: the driver bypasses SDK service impls by calling apiClient.execute() directly, so the SDK's new X-Databricks-Org-Id auto-injection on StatementExecutionImpl/AppsImpl/etc. never fires for us. The rest only populate when the corresponding DatabricksConfig field is null, so driver-set values continue to win (verified by reading the resolveHostMetadata() source).
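
The "only populate when null" behavior can be illustrated with a minimal sketch. SketchConfig and resolveFromMetadata are hypothetical stand-ins, not the SDK's DatabricksConfig or resolveHostMetadata(); the sketch only shows why a value the driver sets first survives a later auto-population step.

```java
/** Hypothetical stand-in for a config object; NOT the SDK's DatabricksConfig. */
final class SketchConfig {
    // Stays null until the caller sets it or resolution fills it in.
    private String workspaceId;

    String getWorkspaceId() {
        return workspaceId;
    }

    SketchConfig setWorkspaceId(String workspaceId) {
        this.workspaceId = workspaceId;
        return this;
    }

    /** Fills the field from discovered metadata only when the caller left it null. */
    SketchConfig resolveFromMetadata(String discoveredWorkspaceId) {
        if (workspaceId == null) {
            workspaceId = discoveredWorkspaceId;
        }
        return this;
    }
}
```

Under this pattern, a caller that sets the field before resolution keeps its own value, and a caller that leaves it unset picks up the discovered one.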

The bootstrap buildUserAgentForConnectorService path retains its own AgentDetector.detect() call because that UA is hand-built via StringBuilder and never goes through UserAgent.asString(). Mitm-verified agent/×1 on every wire request after the fix.
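
To make the agent/×2 symptom concrete, here is a small sketch that counts agent/<name> tokens in a User-Agent string. AgentTokenCounter, appendAgent, and the sample strings in the usage note are hypothetical illustrations; the sketch uses neither the SDK's UserAgent class nor the driver's AgentDetector.

```java
/** Hypothetical helper for inspecting a captured User-Agent header value. */
final class AgentTokenCounter {
    private static final String PREFIX = "agent/";

    /** Counts whitespace-delimited tokens of the form agent/<name>. */
    static int countAgentTokens(String userAgent) {
        int count = 0;
        for (String token : userAgent.trim().split("\\s+")) {
            // Require a non-empty name after the prefix.
            if (token.startsWith(PREFIX) && token.length() > PREFIX.length()) {
                count++;
            }
        }
        return count;
    }

    /** Simulates one layer appending its own agent token to an existing UA string. */
    static String appendAgent(String userAgent, String agentName) {
        return userAgent + " " + PREFIX + agentName;
    }
}
```

Calling appendAgent twice on a base string such as "driver/1.0 sdk/0.106.0" models two layers each injecting the token, and countAgentTokens then reports 2 instead of 1. The same check could be applied to header values captured from a proxy.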

What changed

3 files, +2 / -4 lines total:

pom.xml                                                             | 2 +-
src/main/java/com/databricks/jdbc/common/util/UserAgentManager.java | 3 ---
NEXT_CHANGELOG.md                                                   | 1 +

Test plan

  • mvn install builds clean on SDK 0.106 with no other source changes
  • Mitm wire verified: agent/×2 on SDK 0.106 pre-fix → agent/×1 after, on both SEA and Thrift
  • PAT, M2M Databricks-OIDC, AAD SP — live PASS on pecotesting (Azure SPOG + Legacy) on SDK 0.106
  • U2M browser flow — live PASS on Azure SPOG, Azure Legacy, AWS workspace
  • U2M refresh-token cache reuse — live PASS on AWS (first run opens browser + caches; second run reuses cache, no browser, 6× faster)
  • Statement execution e2e: SELECT 1, NULL, multi-row 100, 10k-row multi-chunk, prepared statement, ResultSetMetaData — all PASS
  • DatabaseMetaData APIs: getCatalogs, getSchemas, getTableTypes, getDatabaseProductName — all PASS
  • Complex types round-trip: ARRAY, MAP, STRUCT, DECIMAL+DATE — all PASS
  • 50× connection lifecycle on both PAT and M2M — no leaks, no token-cache pollution
  • Mocked: CachedTokenSourceRefreshTest, AzureMsiMockTest, OidcDiscoveryTest, ErrorMappingTest, DefaultProfileResolutionTest, UserAgentTest — all PASS

Coverage gaps (not blocking — for transparency)

  • GCP workspace not live-tested: SDK's GoogleCredentialsCredentialsProvider/GoogleIdCredentialsProvider bytecode is essentially unchanged 0.69 → 0.106 (logger refactor only). Recommend canary on GCP customers.
  • Real Azure MSI flow: only mocked-IMDS test; needs an Azure VM for live coverage.

Pre-existing driver issues observed but NOT addressed by this PR

These reproduce identically on SDK 0.69 — not introduced by the bump. Each should be filed separately:

  1. EnableTokenFederation=1 default breaks vanilla U2M (workaround: set EnableTokenFederation=0). The default wraps ExternalBrowserCredentialsProvider with DatabricksTokenFederationProvider, which expects an external IdP token that vanilla U2M doesn't have.
  2. Cross-cloud federation is rejected by the Cloud.AZURE-only check at ClientConfigurator.java:172 (blocks AAD external IdP → GCP/AWS workspace federation).
  3. OAuth-M2M scope is not pluggable via JDBC URL: the driver's Auth_Scope applies only to U2M and JWT M2M paths, not OAuth M2M client_credentials. This breaks federation when the external IdP is Azure AD (AAD), because AAD requires <resource>/.default and the SDK sends all-apis from the DatabricksConfig.getScopes() default.
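
The workaround in item 1 amounts to setting EnableTokenFederation=0 on the connection URL. Below is a rough sketch of a helper that appends or overrides a ;key=value property on a semicolon-delimited JDBC URL. JdbcUrlProps and withProperty are hypothetical names, the host and httpPath in the usage note are placeholders, and matching keys case-insensitively is an assumption rather than documented driver behavior.

```java
/** Hypothetical helper; not part of the driver. */
final class JdbcUrlProps {
    /** Appends ;key=value to the URL, replacing an existing entry for the same key. */
    static String withProperty(String url, String key, String value) {
        String[] parts = url.split(";");
        // parts[0] is the scheme/host/port prefix; the rest are key=value properties.
        StringBuilder out = new StringBuilder(parts[0]);
        boolean replaced = false;
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i];
            if (part.isEmpty()) {
                continue; // tolerate stray semicolons
            }
            int eq = part.indexOf('=');
            String name = eq >= 0 ? part.substring(0, eq) : part;
            if (name.equalsIgnoreCase(key)) {
                // Keep the first occurrence's position, drop any later repeats.
                if (!replaced) {
                    out.append(';').append(key).append('=').append(value);
                    replaced = true;
                }
            } else {
                out.append(';').append(part);
            }
        }
        if (!replaced) {
            out.append(';').append(key).append('=').append(value);
        }
        return out.toString();
    }
}
```

For example, withProperty("jdbc:databricks://example.cloud.databricks.com:443;httpPath=/sql/1.0/warehouses/abc", "EnableTokenFederation", "0") appends ;EnableTokenFederation=0 to that URL. If the URL already carries EnableTokenFederation=1, the existing entry is rewritten in place.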

NO_CHANGELOG=false

This pull request was AI-assisted by Isaac.

SDK 0.106 introduces native AI-coding-agent detection in
com.databricks.sdk.core.UserAgent (new agentProvider() field +
listKnownAgents()/lookupAgentProvider() methods + an agent/<name> block
in asString()). SDK 0.69 had none of these.

The driver's existing AgentDetector.detect() injection into
UserAgent.withOtherInfo("agent", ...) at UserAgentManager.setUserAgent()
now layers on top of the SDK's built-in detection. Both fire on the same
env-var contract, producing two agent/<name> tokens in every SDK-routed
request's User-Agent header (verified live via mitmproxy: agent/x2 on
both SEA and Thrift on SDK 0.106 vs agent/x1 on SDK 0.69).

Remove the driver-side injection in setUserAgent so the SDK is the
single source for the agent/<name> token (and gains coverage for two
agents the driver's list misses: Augment and Windsurf).

The hand-built buildUserAgentForConnectorService path keeps its own
AgentDetector.detect() call because that bootstrap UA is constructed
via StringBuilder and never goes through UserAgent.asString() —
no SDK-side injection happens there.

Audit performed by diffing every driver-imported SDK class between
0.69 and 0.106. The agent collision was the only behavior-impacting
delta on the driver's hot path. Other SDK additions (workspaceId /
accountId / discoveryUrl / tokenAudience auto-population in
DatabricksConfig.resolve(), CachedTokenSource dynamic stale period,
X-Databricks-Org-Id auto-injection in SDK service impls) either don't
reach driver code paths (the driver bypasses SDK service impls by
calling apiClient.execute() directly) or only populate when the
corresponding config field is null (driver-set values win).

Live verified on SDK 0.106 against pecotesting (Azure SPOG + Legacy)
and PECOAWS workspaces:
  - PAT, M2M Databricks-OIDC, AAD SP, U2M browser, U2M refresh-token
    cache reuse — all PASS
  - Multi-chunk download (10k rows), prepared statements, complex
    types (ARRAY/MAP/STRUCT), metadata APIs, 50x connection lifecycle
    — all PASS
  - mitm-captured wire: agent/x1 on both SEA and Thrift after the
    one-line removal in setUserAgent

NO_CHANGELOG=false

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db

Closing in favor of a new PR from the internal branch. The forked-PR CI runs Maven in offline mode only and can't download new dependencies. SDK 0.106.0 isn't in the runner cache, so all build jobs fail at dependency resolution before any code is tested. Reopening from the internal chore/sdk-upgrade-0.106 branch on databricks/databricks-jdbc to get a real CI run.

msrathore-db added a commit that referenced this pull request May 22, 2026
## Summary

- Bumps `databricks-sdk-java` from **0.69.0 → 0.106.0** (37 minor
versions).
- Removes the driver-side `AgentDetector.detect()` call in
`UserAgentManager.setUserAgent()` to avoid duplicate `agent/<name>`
User-Agent tokens. SDK 0.106 introduced native AI-coding-agent detection
in `com.databricks.sdk.core.UserAgent` (new `agentProvider()` + 9-agent
`listKnownAgents()`); layering the driver's own injection on top
produced `agent/×2` on every SDK-routed request.

> Note: this PR replaces #1464 which failed CI on dependency resolution
because forked-PR builds run Maven in offline-cache-only mode and SDK
0.106 isn't in the cache yet. Reopened from an internal branch so CI can
fetch the new SDK artifact from Maven Central.

## Why this is safe to ship

Bytecode-diffed every driver-imported SDK class between 0.69 and 0.106.
All 38 imports survive to 0.106 with compatible signatures (compile
passes unchanged).

The `agent/<name>` collision was the **only** behavior-impacting delta
on the driver's hot path. The other 0.69→0.106 deltas fall into two
groups. Some don't reach driver code paths: the driver bypasses SDK
service impls by calling `apiClient.execute()` directly, so the SDK's
new `X-Databricks-Org-Id` auto-injection on
`StatementExecutionImpl`/`AppsImpl`/etc. never fires for us. The rest
only populate when the corresponding `DatabricksConfig` field is `null`,
so driver-set values continue to win (verified by reading the
`resolveHostMetadata()` source).

The bootstrap `buildUserAgentForConnectorService` path retains its own
`AgentDetector.detect()` call because that UA is hand-built via
`StringBuilder` and never goes through `UserAgent.asString()`.
Mitm-verified `agent/×1` on every wire request after the fix.

## What changed

**3 files, +2 / -4 lines total:**

```
pom.xml                                                             | 2 +-
src/main/java/com/databricks/jdbc/common/util/UserAgentManager.java | 3 ---
NEXT_CHANGELOG.md                                                   | 1 +
```

## Test plan

- [x] `mvn install` builds clean on SDK 0.106 with no other source
changes
- [x] Mitm wire verified: `agent/×2` on SDK 0.106 pre-fix → `agent/×1`
after, on both SEA and Thrift
- [x] PAT, M2M Databricks-OIDC, AAD SP — live PASS on pecotesting (Azure
SPOG + Legacy) on SDK 0.106
- [x] U2M browser flow — live PASS on Azure SPOG, Azure Legacy, AWS
workspace
- [x] U2M refresh-token cache reuse — live PASS on AWS (first run opens
browser + caches; second run reuses cache, no browser, 6× faster)
- [x] Statement execution e2e: SELECT 1, NULL, multi-row 100, **10k-row
multi-chunk**, prepared statement, ResultSetMetaData — all PASS
- [x] DatabaseMetaData APIs: `getCatalogs`, `getSchemas`,
`getTableTypes`, `getDatabaseProductName` — all PASS
- [x] Complex types round-trip: ARRAY, MAP, STRUCT, DECIMAL+DATE — all
PASS
- [x] 50× connection lifecycle on both PAT and M2M — no leaks, no
token-cache pollution
- [x] Mocked: `CachedTokenSourceRefreshTest`, `AzureMsiMockTest`,
`OidcDiscoveryTest`, `ErrorMappingTest`, `DefaultProfileResolutionTest`,
`UserAgentTest` — all PASS

## Coverage gaps (not blocking — for transparency)

- **GCP workspace not live-tested**: SDK's
`GoogleCredentialsCredentialsProvider`/`GoogleIdCredentialsProvider`
bytecode is essentially unchanged 0.69 → 0.106 (logger refactor only).
Recommend canary on GCP customers.
- **Real Azure MSI flow**: only mocked-IMDS test; needs an Azure VM for
live coverage.

## Pre-existing driver issues observed but NOT addressed by this PR

These reproduce identically on SDK 0.69 — not introduced by the bump.
Each should be filed separately:

1. `EnableTokenFederation=1` default breaks vanilla U2M (workaround: set
`EnableTokenFederation=0`). The default wraps
`ExternalBrowserCredentialsProvider` with
`DatabricksTokenFederationProvider`, which expects an external IdP
token that vanilla U2M doesn't have.
2. Cross-cloud federation is rejected by the `Cloud.AZURE`-only check at
`ClientConfigurator.java:172` (blocks AAD external IdP → GCP/AWS
workspace federation).
3. OAuth-M2M scope is not pluggable via JDBC URL: the driver's
`Auth_Scope` applies only to U2M and JWT M2M paths, not OAuth M2M
`client_credentials`. This breaks federation when the external IdP is
Azure AD (AAD), because AAD requires `<resource>/.default` and the SDK
sends `all-apis` from the `DatabricksConfig.getScopes()` default.

NO_CHANGELOG=false

This pull request was AI-assisted by Isaac.

---------

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>